Preparing the data for PCA

Sampling from dataframe

PCA embedding

It looks like our PCA model is able to distinguish between some categories. In fact, as we will see later on, it is very efficient in clustering the platelets (orange points).

Plotting the previous scatter plot with annotated images for our data

Just as we said previously, the PCA has been able to distinguish between categories according to the size of the bloodcells in the images, which makes it really useful for detecting platelet.

LDA embedding

Amazing ! Once again, we have been able to visually cluster some categories according to the bloodcell size.

t-SNE embedding

UMAP emdedding

While t-SNE is really good at conserving local distance, PCA is good at conversing global structure. The UMAP algorithm is good at maintaining both. Let's have a look.

Applying our UMAP model

As with the other methods of dimension reduction, the platelets form a single cluster. We can also see that some of the cells appears to be in sub-cluster within the main cluster.

We can check whethet the UMAP is performing good on the global distances using the PCA diagnosis tool.

The plot that the UMAP was able to integrate global variance within its dimension reduction.

UMAP on a subsample

Let's plot the image on a subsample fitted with UMAP

We can clearly see here that the platelets stand out, moreover the UMAP is able to group other cells into subclusters and UMAP component 2 appears to differentiate cells by size and brightness.

Supervised UMAP

We can also try to supervised UMAP with our labels

Conclusion

Using dimension reduction, we found consistent separation of platelets from the rest of the cell. Platelets thus appears as an easy cell to classify. Those results are in agreements with the data from the luminance and deconvolution exploration. The UMAP revealed that the cells tend to cluster together within an important cluster. On the PCA and UMAP, we could also observed that the size of cells and the brightness contribute to the global variance.

The supervised UMAP is an interesting model for dimension reduction as it manage to truly separate each cell groups. We will use it for our base model.